Capturing asymmetry in COVID-19 counts using an improved skewness measure for time series data


Introduction
Since the outbreak of the novel coronavirus (COVID-19), an abundant literature has analyzed different perspectives and features of the virus. The impact of this global pandemic is so large that, to date, over 210 million cases have been recorded as per the WHO. There is thus an ever-growing need for such analyses to monitor the path of the pandemic. Moreover, numerous statistical approaches have been documented that are capable of predicting or forecasting the case totals through an underlying model. To list a few, [7] proposed a statistical model to analyze the virus spread specifically in Spain and Italy. Sarkar et al. [10] adopted a similar approach to study the virus spread in India. Arora et al. [3] designed a deep learning approach to predict the number of positive cases, whereas [12] proposed logistic and machine learning models for prediction. A general idea of the awareness of this issue can be gauged from the number of "country-specific" analyses undertaken by many authors. To name a couple, [11] introduced a stochastic modeling approach to analyze the virus prevalence in East African countries, and [9] analyzed the predictive modeling of confirmed cases in Nigeria.
Among all these, a specific aspect of analysis is the amount of symmetry or asymmetry present in the daily case numbers, which amounts to analyzing the skewness of such data. A largely asymmetric data distribution poses multiple problems in estimation, be it point or interval estimation. Typically, a 95% interval for the average daily counts in such situations would present a false picture of the true coverage level. Further, the skewness also indicates the direction of deviations from the mean. The skewness measure is used predominantly in the finance literature, say to describe the returns of a stock. However, some attempts have been made to connect this idea to COVID-19 modeling. In particular, a good reference is Akhtar [2], where the pandemic curves are analyzed through a probability density function with associated skewness and kurtosis measures. Our focus in this paper is to adopt a time series approach to model the number of daily cases, and to propose a new measure of skewness which proves to be better than the classical one. Several papers in the literature introduce new measures of skewness, but not all of these are easily applicable to time series data. A few references in this regard are [6], who proposed a robust measure of skewness, and [1], who presented a skewness measure termed the "split sample skewness". From a time series perspective, Bai and Ng [4] present the sampling distributions of the coefficients of skewness and kurtosis, along with a joint test of normality for time series observations.
The outline of this paper is as follows. Section 'A proposed measure of skewness' presents the proposed measure of skewness along with a few interesting properties. Section 'Simulations and comparisons' gives a simulation analysis that adopts a bootstrap methodology to compare the powers of the discussed skewness measures. Section 'Real data analysis' presents a real data example on modeling the daily COVID-19 numbers for a batch of selected countries. Brief conclusions are included in the fifth section.

A proposed measure of skewness
Let $\{X_t\}_{t=1}^{T}$ be a time series with mean $\mu$ and standard deviation $\sigma$. Further, let $\mu_r = E[(X_t - \mu)^r]$ be the $r$th central moment of $X_t$, with $\mu_2 = \sigma^2$. The classical measure of skewness (denoted by $\gamma$ hereafter), which has been applied even to time series, is given by

$$\gamma = \frac{\mu_3}{\mu_2^{3/2}} = \frac{E[(X_t - \mu)^3]}{\sigma^3}. \qquad (1)$$

We now propose a new measure of skewness, where the focus is on time series that are integrated of a certain order (say $d$), in other words, series that are non-stationary and have a trend component. The idea behind its design stems from Erdem et al. [8] and Bapat [5]. We denote the new measure by $\tau_d$; it is defined in Eq. (2) through the $d$th difference $\nabla^d X_t$ of the series, where $\nabla$ is the usual differencing operator, in general given by

$$\nabla^d X_t = \sum_{k=0}^{d} (-1)^k \binom{d}{k} X_{t-k}.$$

Here, $d = 0, 1, 2, \ldots$ depends on the order of integration of the series. As an illustration, $\nabla X_t = X_t - X_{t-1}$, $\nabla^2 X_t = X_t - 2X_{t-1} + X_{t-2}$, and so on. To capture the majority of real-world series, a reasonable degree of trend is 0, 1, 2 or 3, representing stationary, linear, quadratic and cubic trends respectively. Our priority in this paper is to elaborate on the skewness measures $\tau_1$ and $\tau_2$. In the next section, we present a variety of integrated time series models by considering different distributions for the innovations. For convenience and completeness, the specific skewness measures for the different orders of integration $d$, in particular $\tau_1$ and $\tau_2$, follow from Eq. (2) with $\nabla X_t$ and $\nabla^2 X_t$ respectively.

The following subsection provides a few lemmas associated with the newly proposed measure of skewness $\tau_d$. For brevity alone, we restrict ourselves to $\tau_1$. One can easily extend these to higher values of $d$.
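The two quantities discussed above can be sketched numerically. The block below is only an illustration under a stated assumption: the new measure is taken to be the classical skewness coefficient evaluated on the $d$th difference of the series, which matches the verbal description in the text but is not guaranteed to be the exact form of Eq. (2). The function names `classical_skewness` and `differenced_skewness` are hypothetical.

```python
import numpy as np

def classical_skewness(x):
    """Sample analogue of gamma = mu_3 / mu_2^(3/2)."""
    x = np.asarray(x, dtype=float)
    centred = x - x.mean()
    m2 = np.mean(centred ** 2)   # second central moment
    m3 = np.mean(centred ** 3)   # third central moment
    return m3 / m2 ** 1.5

def differenced_skewness(x, d=1):
    """ASSUMPTION: tau_d is taken as the classical skewness of the
    d-th difference of the series (d = 0 returns the classical value)."""
    x = np.asarray(x, dtype=float)
    # np.diff with n=d applies the differencing operator d times,
    # e.g. n=2 gives x[t] - 2*x[t-1] + x[t-2].
    return classical_skewness(np.diff(x, n=d))

# Illustration: a linear trend plus a random walk driven by symmetric noise.
rng = np.random.default_rng(42)
t = np.arange(400)
series = 0.5 * t + np.cumsum(rng.standard_normal(400))
gamma_hat = classical_skewness(series)
tau1_hat = differenced_skewness(series, d=1)
```

Passing `d=2` would apply the same idea to a series with a quadratic trend.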
and analogously $\tau_1$ equals 0. □

The following lemmas outline properties of $\tau_1$ under non-stationarity and cointegration respectively.
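Lemma 2.2. Let $\{X_t\}_{t=1}^{T}$ represent a first-order random walk given by

$$X_t = X_{t-1} + \varepsilon_t,$$

where $\varepsilon_t$ is a Gaussian white noise sequence with mean 0 and variance 1. Then, $\tau_1 = 0$.

Proof. If we simulate a large number of $\varepsilon_t$ sequences, we clearly have $\nabla X_t = X_t - X_{t-1} = \varepsilon_t$, whose third moment vanishes because the Gaussian distribution is symmetric around 0. And hence, $\tau_1 = 0$. □

Lemma 2.3. Let $\{X_t\}_{t=1}^{T}$ and $\{Y_t\}_{t=1}^{T}$ be two cointegrated processes whose defining equations involve two independent Gaussian white noise sequences, each with mean 0 and variance 1, and a constant coefficient. Then the corresponding skewness equals 0.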
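The random-walk statement can be checked informally by simulation. The sketch below assumes that $\tau_1$ is estimated by the sample skewness of the first differences; the helper `skew_rows` and the replication settings are illustrative choices, not taken from the paper.

```python
import numpy as np

def skew_rows(a):
    """Row-wise sample skewness m3 / m2^(3/2)."""
    c = a - a.mean(axis=1, keepdims=True)
    return (c ** 3).mean(axis=1) / (c ** 2).mean(axis=1) ** 1.5

rng = np.random.default_rng(0)
T, n_rep = 500, 2000                       # illustrative sizes
eps = rng.standard_normal((n_rep, T))      # Gaussian white noise, mean 0, variance 1
walks = np.cumsum(eps, axis=1)             # X_t = X_{t-1} + eps_t, one walk per row

gamma_hat = skew_rows(walks)                     # classical skewness of the levels
tau1_hat = skew_rows(np.diff(walks, axis=1))     # skewness of the first differences

print("mean and spread of gamma_hat:", gamma_hat.mean(), gamma_hat.std())
print("mean and spread of tau1_hat: ", tau1_hat.mean(), tau1_hat.std())
```

Under these assumptions the differenced estimates should concentrate near 0, while the estimates computed on the levels of the walk are far more dispersed.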

Simulations and comparisons
We now present results for a number of different time series processes with varying orders of integration. To this end, a specific bootstrap test for the skewness measure is established. This approach is along the lines of that given by Adil et al. [9]. Since we are dealing with time series, a usual way to introduce symmetry or asymmetry in the series is to fix the distribution of the innovation terms in each case. Specifically, to induce symmetry we consider $N(0, 1)$ and $t_5$ distributions, whereas to induce asymmetry we consider $\chi^2_5$ and lognormal (LN) distributions. Further, we consider several ARIMA models integrated of order either 1 or 2, along with the above-mentioned innovation distributions. We fix the AR parameters at 0.7 and 0.2 in each case. Table 1 outlines the time series processes considered for analysis.

Table 1
Processes under consideration.

Model | Innovation distribution

Clearly, series 1, 2, 3 and 4 are symmetric whereas series 5 and 6 are asymmetric. The next subsection outlines the said bootstrap test along with a power comparison methodology.
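Processes of the kind described in this section can be generated along the following lines. This is a minimal sketch rather than the authors' simulation code: the function `simulate_arima_p2`, the burn-in length, the series length, the lognormal parameters (0, 1) and the decision not to centre the skewed innovations are all assumptions made for illustration.

```python
import numpy as np

def simulate_arima_p2(T, d, innovations, phi=(0.7, 0.2), burn=200):
    """Simulate an ARIMA(2, d, 0) path: run an AR(2) recursion on the
    supplied innovations, drop a burn-in, then integrate d times."""
    e = np.asarray(innovations, dtype=float)
    n = T + burn
    if e.size != n:
        raise ValueError("need T + burn innovations")
    w = np.zeros(n)
    for t in range(n):
        w[t] = e[t]
        if t >= 1:
            w[t] += phi[0] * w[t - 1]
        if t >= 2:
            w[t] += phi[1] * w[t - 2]
    x = w[burn:]
    for _ in range(d):          # each cumulative sum adds one order of integration
        x = np.cumsum(x)
    return x

rng = np.random.default_rng(1)
n = 300 + 200                   # T + burn
draws = {
    "N(0,1)": rng.standard_normal(n),                      # symmetric
    "t_5": rng.standard_t(5, size=n),                      # symmetric, heavier tails
    "chi2_5": rng.chisquare(5, size=n),                    # right-skewed
    "LN": rng.lognormal(mean=0.0, sigma=1.0, size=n),      # right-skewed (assumed parameters)
}
series = {name: simulate_arima_p2(300, d=1, innovations=e) for name, e in draws.items()}
```

Setting `d=2` in the same call gives the second-order integrated counterparts.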

Bootstrap test and power comparisons
As seen in the previous section, our newly proposed skewness measure also takes the value 0 when the underlying series is symmetric. We hence design an appropriate bootstrap test for skewness, which is robust to the choice of critical values and the initial sample. Without loss of generality, we assume that a series $\{X_t\}_{t=1}^{T}$ is symmetric around $\theta$, where $\theta$ is the general central parameter (either mean or median). The null hypothesis of symmetry can be expressed as

$$H_0 : F(\theta - x) = 1 - F(\theta + x) \quad \text{for all } x,$$

where $F$ stands for the distribution of the series $X_t$, and $x$ denotes a generic value of $X_t$. A suitable bootstrap approach involves Steps 1 to 4. If the observed test statistic falls in the 95% interval, the hypothesis of symmetry is accepted; otherwise it is rejected. Now, in order to compare the power of the existing skewness measure ($\gamma$) with that of the newly proposed measures ($\tau_1$, $\tau_2$), we adopt the following strategy:
• Apply the bootstrap algorithm outlined above to get the required sorted test statistic values along with the quantiles.
• Simulate the underlying series along with the corresponding error distribution a fixed number of times (say 10,000).
• If the error distribution is assumed to be symmetric, count the percentage of correct acceptances of the null hypothesis, which will be the power of the test. Similarly, if the error distribution is assumed to be asymmetric, count the percentage of correct rejections of the null hypothesis, which will be the power of the test in that case.
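The power-comparison strategy in the list above can be sketched as follows. Several choices here are assumptions made to keep the example short: the series is a plain random walk (integration of order 1 with no AR part), the statistic is the sample skewness of the first differences, the thresholds come from one initial series, and the replication counts are far smaller than 10,000. The names `bootstrap_thresholds` and `estimate_power` are hypothetical.

```python
import numpy as np

def skew_rows(a):
    """Row-wise sample skewness m3 / m2^(3/2)."""
    c = a - a.mean(axis=1, keepdims=True)
    return (c ** 3).mean(axis=1) / (c ** 2).mean(axis=1) ** 1.5

def bootstrap_thresholds(x, rng, n_boot=2000, level=0.05):
    """Symmetrize x around its mean, resample, and return the lower and
    upper quantiles of the bootstrapped skewness values."""
    pooled = np.concatenate([x - x.mean(), x.mean() - x])
    samples = rng.choice(pooled, size=(n_boot, x.size), replace=True)
    return np.quantile(skew_rows(samples), [level / 2, 1 - level / 2])

def estimate_power(draw, symmetric, T=200, n_sim=2000, seed=0):
    """Share of correct decisions over n_sim simulated series.
    `draw(rng, size)` must return innovations of the requested shape."""
    rng = np.random.default_rng(seed)
    initial = np.diff(np.cumsum(draw(rng, T + 1)))         # differenced initial series
    lower, upper = bootstrap_thresholds(initial, rng)
    sims = np.cumsum(draw(rng, (n_sim, T + 1)), axis=1)    # one random walk per row
    tau1 = skew_rows(np.diff(sims, axis=1))
    accepted = (tau1 >= lower) & (tau1 <= upper)
    # Correct decision: accept under symmetry, reject under asymmetry.
    return accepted.mean() if symmetric else 1.0 - accepted.mean()

power_normal = estimate_power(lambda rng, size: rng.standard_normal(size), symmetric=True)
power_chi2 = estimate_power(lambda rng, size: rng.chisquare(5, size), symmetric=False)
```

The same routine could be pointed at the classical skewness of the undifferenced series to obtain the competing power figure.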
We apply the above bootstrap algorithm along with the power comparison to all the models listed in Table 1. These analyses are outlined in Table 2. We also report the means of the calculated skewness measures ($\hat{\gamma}$, $\hat{\tau}_1$, $\hat{\tau}_2$) obtained from the 10,000 simulated series. Ideally, if a series is assumed to be symmetric, the average estimated skewness measure should be close to 0, and if a series is assumed to be asymmetric, it should be away from 0. The powers of the three measures carry the subscripts 1, 2 and 3 respectively. Clearly, we only compare $\hat{\gamma}$ and $\hat{\tau}_1$ if the order of integration is 1, and all three measures if the order of integration is 2. For illustration purposes, Fig. 1 contains side-by-side plots of the estimates obtained under the usual and the newly proposed skewness measures, for the ARIMA(2, 1, 0) model under $t_5$ and $N(0, 1)$ innovations. As seen from the figure, the estimates are much tighter around 0 for the newly proposed skewness measure, which suggests that it represents a symmetric distribution more adequately.

As an aid, we now provide a point-wise description of the entire methodology. We adopt the following steps to find the new skewness measure and compare it with the usual one.
• Assuming a time series $\{X_t\}_{t=1}^{T}$ is non-stationary (integrated of order $d$), we take the $d$th difference of $X_t$ and find the new measure of skewness ($\tau_d$) given in Eq. (2).
• We also find the usual measure of skewness ($\gamma$) given in Eq. (1).
• We repeat the above steps for simulated series generated from several ARIMA models, with innovations having different distributions representing symmetry or asymmetry.
• To compare the two measures of skewness, we apply the bootstrap methodology outlined in Section 'Bootstrap test and power comparisons'. This involves comparing the average estimates over a set number of simulations, along with comparing the powers of both test statistics.
• If a series is indeed symmetric, the skewness measure is expected to hover around 0, and to lie farther from 0 otherwise.

Real data analysis
We now explore the practicality and advantages of our newly proposed skewness measure ($\tau_d$) over the existing skewness measure ($\gamma$) in understanding whether the COVID-19 pandemic behaves symmetrically or highly asymmetrically. Intuitively, for most countries, lower daily case counts (DCC) occur with higher frequencies. Hence such DCC will resemble an asymmetric, more specifically right-skewed, pattern. In this analysis, we focus only on the DCC for a group of selected countries across the world. The countries in this group are Belize, British Virgin Islands, China, Dominica and Lesotho, whose DCC are seen to be highly right-skewed (asymmetric), with a skewness value of at least 10. We look at data from January 3, 2020 to August 20, 2021. Fig. 2 shows the respective histograms and raw DCC plots for each of these countries. Interestingly, one can note from these that in British Virgin Islands and Dominica the surge in DCC happened much later than in the other countries. The primary reason is likely that both are island nations situated in the Caribbean, which might have restricted the spread of the virus. Also, all the histograms clearly show a tendency towards asymmetry. For convenience, Table 3 contains some descriptive summary statistics for the DCC of the above countries.

Before we compare the asymmetry through the different skewness measures, we fit a suitable time series model to the DCC of these countries. An automatic model-selection function from a statistical software package was used to obtain the best fit. Table 4 outlines these results for comparison. Again interestingly, all the models exhibit integration of order 1. In order to compare the powers of the two skewness quantities, we again adopt the bootstrap technique mentioned in Section 'Simulations and comparisons'. A brief outline is as follows: for a particular country, we look at its DCC and, assuming that it is symmetric around its mean, construct $2T$ linear transformations as given in Step 2 of the bootstrap procedure. We further execute Steps 3 and 4. Lastly, to find the power, we consider the observed DCC series. One should note that in this case, since there is a single realization of the observed series, the power can only be 0 or 1. Table 5 contains the skewness summaries along with the corresponding powers for each country. Since all the DCC are integrated of order 1, we only compare $\gamma$ and $\tau_1$. As one can note, for all the countries,
Step 1 Let $S^*$ be an appropriate test statistic which measures skewness. The null hypothesis will be rejected if $S^*$ is significantly different from 0.
Step 2 Construct $2T$ linear transformations of $X_t$ which give rise to transformed series $\{Y_t\}_{t=1}^{T}$ and $\{Z_t\}_{t=1}^{T}$ as follows: $Y_1 = X_1 - \theta, Y_2 = X_2 - \theta, \ldots, Y_T = X_T - \theta$ and $Z_1 = \theta - X_1, Z_2 = \theta - X_2, \ldots, Z_T = \theta - X_T$. Finally, let $W$ be a collective sample of the transformed series. This way, the initial series is symmetrized around $\theta$.
Step 3 Generate a bootstrapped series $W^* = \{W_1, W_2, \ldots, W_T\}$, where the elements are drawn from $W$. Then find the value of the test statistic $S^*(W^*)$. Repeat the bootstrap sampling, say, 10,000 times and get a collection of corresponding $S^*$ values.
Step 4 Order the test statistic values ($S^*_1 < S^*_2 < \ldots < S^*_{10,000}$) and find the lower and upper 2.5% thresholds for this collection of $S^*$ values. Denote them by $S^*_{25}$ and $S^*_{975}$ respectively.
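Steps 1 to 4 can be sketched in a few lines. The block below is an illustration under assumptions, not the authors' implementation: the test statistic is taken to be the sample skewness coefficient, the centre defaults to the sample mean, and resampling is done with replacement from the pooled symmetrized sample. To test an integrated series with the new measure, one would pass the differenced series instead of the levels (again an assumption). The function name `bootstrap_symmetry_test` is hypothetical.

```python
import numpy as np

def sample_skewness(x):
    c = x - x.mean()
    return np.mean(c ** 3) / np.mean(c ** 2) ** 1.5

def bootstrap_symmetry_test(x, centre=None, n_boot=10_000, level=0.05, seed=0):
    """Return (observed statistic, lower threshold, upper threshold, accepted)."""
    x = np.asarray(x, dtype=float)
    theta = x.mean() if centre is None else centre
    # Step 2: pool the 2T transformed values x - theta and theta - x.
    pooled = np.concatenate([x - theta, theta - x])
    # Step 3: draw n_boot bootstrap series of length T from the pooled sample.
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, pooled.size, size=(n_boot, x.size))
    samples = pooled[idx]
    c = samples - samples.mean(axis=1, keepdims=True)
    stats = (c ** 3).mean(axis=1) / (c ** 2).mean(axis=1) ** 1.5
    # Step 4: lower and upper thresholds of the bootstrapped statistics.
    lower, upper = np.quantile(stats, [level / 2, 1 - level / 2])
    # Step 1 decision: symmetry is retained when the observed value lies inside.
    observed = sample_skewness(x)
    return observed, lower, upper, bool(lower <= observed <= upper)

# Usage on a right-skewed sample; fewer replicates keep the run short.
rng = np.random.default_rng(7)
counts = rng.chisquare(5, size=300)
observed, lower, upper, accepted = bootstrap_symmetry_test(counts, n_boot=2000)
```

Using `np.quantile` in place of reading off ordered positions gives the same thresholds up to interpolation, and avoids tying the code to a fixed number of replicates.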

Table 2
Power comparisons for the underlying series.

Table 3
Summary statistics for the daily case counts.